In this paper we study the evolution of the mutation rate
for simple organisms in dynamic environments. A model with
multiple fitness-coding loci tracking a moving fitness peak
is developed and an analytical expression for the optimal
mutation rate is derived. Surprisingly, it turns out that the
optimal mutation rate per genome is approximately independent
of genome length, something that has also been observed in
nature. Simulations confirm the theoretical predictions. We
also suggest an explanation for the difference in mutation
frequency between RNA- and DNA-based organisms.

I. INTRODUCTION 

In any given environment the vast majority of mutations that
have any effect on the fitness of a biological organism are
deleterious. One might expect the damaging effect of non-zero
mutation rates to imply that, when under evolutionary control,
the lowest mutation rate compatible with physiological
constraints should be selected for. However, when examined
experimentally, bacteria and viruses (and indeed all
organisms) have significant non-zero mutation rates, the
magnitude and diversity of which have so far lacked a
satisfactory theoretical explanation. Some results from a
number of experiments measuring the mutation rates of a
selection of small DNA-based organisms are shown in Table 1.

per genome $\mu_G$) in DNA-based microbes with different
genome lengths $\nu$. (Data reproduced from Drake et al. [1])



Despite the huge variation in genome length, over four orders
of magnitude, the mutation rate per genome and replication
$\mu_G$ remains constant to within a factor of roughly 2
(which is at the same level as the estimated accuracy of the
figures). As pointed out by Drake and others ||jUa| this
constancy in $\mu_G$ is surprising since DNA/RNA repair and
transcription are primarily local processes that act on
individual bases. Thus the data strongly suggest that point
mutation rates for the different organisms have evolved
towards individual optimal values that result in almost
constant genomic copying fidelity.

In this paper we develop a model of the evolution of mutation
rates based on changing environments. The evolved point
mutation rate of this model scales so that the genomic copying
fidelity is approximately independent of genome length and
insensitive to other parameters in the model. For biologically
plausible parameter settings the evolved mutation rates are
also of the same magnitude as those listed in Table 1. We also
suggest a possible explanation for the high mutation rates of
RNA viruses. Simulations confirm the predictions of the model.

II. EVOLVING MUTATION RATES 

It is impossible to perfectly maintain and copy genetic
information. All molecules, including DNA and RNA, are
thermodynamically unstable, and their physical structure, and
hence the information they encode, changes over time. In
addition the binding sites of enzymes such as DNA polymerase
are not perfectly specific and errors will be introduced
during replication. Lowering the error rate requires the use
of increasingly complex proof-reading and repair mechanisms,
all of which ultimately impose an energetic, and hence
fitness, cost on the organism. We can expect a balance to
develop between the pressure to lower mutation rates due to
the fitness cost of deleterious mutants and the physiological
cost of high copying accuracy ^-^]. Such a balance certainly
provides an ultimate lower limit to the mutation rate of all
organisms, but explaining the constancy in genomic copying
fidelity using such arguments requires unnatural assumptions
about the relation between the cost of local copying fidelity
and genome length. There is also little experimental evidence
that mutation rates are actually determined by such a
balance.

When viewed as a whole the genome encodes not only proteins
that directly influence its reproductive or survival ability,
but also the copying fidelity with which the genome
reproduces. For example, some viroids contain genes that are
translated into surface coat proteins while other genes code
for the replicase enzymes that perform the copying of their
genetic material. In more complex organisms additional genes
may encode modifiers of the accuracy of copy and repair
enzymes, usually increasing mutation rates (6]^, but sometimes
resulting in a decrease [10]. These modifiers can have large
or small effects on mutation rate and affect individual bases
or the entire genome JTTl-p^t.

One consequence of this flexibility of mutation rates and
their encoding is that if there are random changes (mutations)
in genes determining the mutation rate then the copying
fidelity will itself undergo Darwinian evolution.



III. POPULATION GENETICS IN CHANGING 
ENVIRONMENTS 

When comparing two haploid genomes, the one with the lower
mutation frequency will produce offspring that are on average
more closely related to itself. This means that for an asexual
haploid replicator evolving on a static fitness landscape the
optimal mutation rate for a sequence whose fitness is already
globally maximal is zero. If the fitness peak moves, however,
the situation changes: to avoid extinction, a genome with an
initially superior fitness is forced to accept a non-zero
mutation rate. This leads to a non-trivial optimal copying
fidelity.

Kimura formalized the evolutionary effect of a changing
environment by considering the genetic load of a population H:
the proportion by which the population fitness is decreased in
comparison with an optimum genotype. Genetic load results from
a number of competing factors, most notably the mutational
load due to the deleterious effects of most mutations and the
segregational load due to the temporary reduction in fitness
that occurs whenever the selective environment changes.
Assuming that a population minimizes the genetic load, the
optimal mutation rate can be calculated. Using a discrete
time model, i.e. a model where there is no overlap between
generations, with one fitness-determining locus the optimal
mutation rate becomes:

$$\mu_{opt} = \frac{1}{\tau} \qquad (1)$$

where $\tau$ is the number of generations between
environmental changes. This model only considers the effect of
mutations on the population and is therefore based on group
selection.



Later population genetic models that examined competition
between genetic modifiers of the mutation rate demonstrated
that (for haploids with a single fitness-determining locus) a
non-zero mutation rate comes to dominate a population evolving
in an oscillating environment [14-17]. These models are not
built on group selection. However, a general multi-locus
modifier model that is simple to interpret does not exist.



IV. THE MODEL 

We will explore a more general model of the evolution of
mutation rates in a dynamic environment. Consider a population
of haploid genomes where a genome consists of two separate
parts, one coding for the fitness and one coding for the
probability per base $\mu$ of an error occurring during
copying. There is complete linkage (no recombination) between
the sections of the genome that encode the mutation rate and
those that determine the fitness. We also assume that the
fitness-determining region is of fixed length $\nu$. In
general we are interested in the fates of certain genomes
$g_i$ which have a (possibly time-dependent) fitness advantage
$\sigma(t)$ over all other sequences. We call these genomes
master-sequences. The genomic copying fidelity of the
fitness-determining region of each strain $g_i$ is
$Q_i = (1 - \mu_i)^{\nu}$. The index $i$ refers to the
mutation rate of the strain: different strains have different
mutation rates but identical fitness $\sigma$. We assume that
mutations do not affect the copying fidelity, only the
fitness. Changes to the mutation rates occur on a time-scale
significantly slower than the time it takes for the population
to reach equilibrium. During a period when a specific sequence
has superior fitness compared to the background (i.e. between
environmental shifts) the changes in the relative
concentrations $x_i$ of the master-sequences are described by
the replicator equation

$$\dot{x}_i(t) = Q_i \sigma(t) x_i(t) - f(t) x_i(t) \qquad (2)$$

where $f(t) = \sigma(t) \sum_j Q_j x_j(t)$ normalizes the
relative concentrations of the master-sequence strains.
Mutations from background sequences onto the strains with
optimal fitness are ignored. Since we are only interested in
competition between master-sequences the background is not
explicitly expressed in these equations.
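
As a minimal numerical sketch of Eq. (2), the following Python
integrates the replicator equation between two environmental
shifts with an explicit Euler scheme. The function names, the
step size `dt` and the example mutation rates are assumptions
made for illustration; only $Q_i = (1 - \mu_i)^{\nu}$ and the
form of the equation are taken from the text.

```python
def replicator_step(x, mus, sigma, nu, dt):
    """One explicit Euler step of dx_i/dt = Q_i*sigma*x_i - f*x_i."""
    # Genomic copying fidelity of each strain, Q_i = (1 - mu_i)^nu.
    Q = [(1.0 - mu) ** nu for mu in mus]
    # Normalizing flux f = sigma * sum_j Q_j x_j.
    f = sigma * sum(q * xi for q, xi in zip(Q, x))
    return [xi + dt * (q * sigma * xi - f * xi) for q, xi in zip(Q, x)]


def integrate(x0, mus, sigma, nu, t_end, dt=1e-3):
    """Integrate the relative concentrations from t = 0 to t = t_end."""
    x = list(x0)
    for _ in range(int(round(t_end / dt))):
        x = replicator_step(x, mus, sigma, nu, dt)
    return x


# Two strains with equal initial share and a constant advantage sigma
# (illustrative values, not taken from the paper's figures).
x_final = integrate([0.5, 0.5], mus=[0.001, 0.05], sigma=5.0, nu=25, t_end=2.0)
```

With a fixed peak the strain with the lower mutation rate
takes over, which is the static-landscape behavior described
in Section III; the concentrations keep summing to one because
of the term $f(t)$.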

The environment changes as follows: for a time
$t \in [0, \tau_1]$ one genotype has superior fitness,
followed by a new gene-sequence for time
$t \in [\tau_1, \tau_1 + \tau_2]$, etc. The notation is chosen
so that $\tau$ denotes lengths of time intervals. We assume
that the concentrations of the new master-sequences $x_i$
immediately after the shift (at time
$t_a = \sum_{j=1}^{m} \tau_j + \epsilon$, where $m$ counts the
shifts of the fitness peak and $\epsilon$ is an infinitely
small time-period) are proportional to the concentrations of
the old master-sequences before the shift (at
$t_b = \sum_{j=1}^{m} \tau_j - \epsilon$):

$$x_i(t_a) = h(\mu_i) x_i(t_b) \qquad (3)$$

It is reasonable to assume that $h(\mu_i)$ is a function with
a Taylor expansion in the mutation rate $\mu_i$,

$$h(\mu_i) = \sum_{j \ge k_m} \alpha_j \mu_i^{j} \qquad (4)$$

where $k_m$ is a measure of the environmental change, i.e. the
number of point mutations needed to transform the old superior
sequence into the new. This basically means that $k_m$ is the
Hamming distance from the old peak to the new at shift $m$.
The constants $\alpha_j$ are combinatorial factors. It will
turn out that the optimal mutation rate is independent of
these factors.
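
One concrete form of $h(\mu)$ can be sketched as follows. The
assumption here, which is not stated in the text, is a binary
genome in which a copy of the old master-sequence lands on the
new peak only if exactly the $k$ differing bits flip and the
remaining $\nu - k$ bits are copied correctly. The names
`shift_factor` and `log_slope` are hypothetical.

```python
import math


def shift_factor(mu, nu, k):
    """Assumed h(mu): flip exactly the k required bits, copy the rest correctly."""
    return mu ** k * (1.0 - mu) ** (nu - k)


def log_slope(mu, nu, k, rel_step=1e-6):
    """Numerical d ln h / d ln mu by a central difference in log-space."""
    lo, hi = mu * (1.0 - rel_step), mu * (1.0 + rel_step)
    num = math.log(shift_factor(hi, nu, k)) - math.log(shift_factor(lo, nu, k))
    return num / (math.log(hi) - math.log(lo))


# For small mu the leading term mu^k dominates, so the slope is close to k.
slope = log_slope(1e-4, nu=25, k=2)
```

Under this assumption the lowest-order term of the expansion
is $\mu^{k}$, so for small $\mu$ the logarithmic slope of $h$
is set by the Hamming distance alone and the prefactor plays
no role.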

To analyze the long term behavior of this system we make a
change of variables
$y_i(t) = e^{\int_0^t f(s)\,ds} x_i(t)$. The new system of
differential equations is linear and the equations are
decoupled (due to the assumption that the selective dynamics
is significantly faster than the changes in mutation rate). It
is therefore easy to find the analytical solution:

$$y_i(t) = y_i(0) \, e^{Q_i \sum_m \int_0^{\tau_m} \sigma_m(s)\,ds} \prod_m h(\mu_i) \qquad (5)$$

where the sum and the product run over the periods $m$ that
have elapsed up to time $t$, $\sigma_m$ is the fitness
advantage during period $m$, and the factor $h(\mu_i)$ for
shift $m$ is evaluated with $k_m$.

Since $x_i$ is proportional to $y_i$, maximizing the growth of
$y_i$ and of $x_i$ is equivalent. After a suitably long time
interval the population will be completely dominated by
genomes that have a mutation rate closest to the optimal value
$\mu_{opt}$ which maximizes the long term growth of the strain

where $\langle \cdot \rangle_m$ denotes a time average during
time-period $m$. Setting the derivative of this expression to
zero and using Eq. (4) we find the optimal copying fidelity to
be approximately

where $\langle \cdot \rangle$ denotes a time average over all
time periods. We also assume no correlation between
$\langle \sigma \rangle_m$ and $\tau_m$. Since the genome
length is large, $\nu \gg 1$, the optimal copying fidelity and
mutation rate per genome become:

Thus we find that the optimal genomic copying fidelity is
independent of the genome length for fairly general types of
environmental change in both the advantage of the fittest
genotype $\sigma(t)$ and the size of environmental shifts
$h(\mu)$.
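
The steps of this argument can be sketched in LaTeX as
follows. This is a reconstruction from the model definitions,
not necessarily the authors' exact expressions. It assumes
that $h(\mu)$ is dominated by its leading term
$\alpha_{k_m} \mu^{k_m}$, that
$\int_0^{\tau_m} \sigma_m(s)\,ds = \langle\sigma\rangle_m \tau_m$,
and that $\langle\sigma\rangle_m$ and $\tau_m$ are
uncorrelated.

```latex
% Hedged reconstruction of the optimum from Eqs. (2)-(5).
\begin{align*}
\ln\frac{y_i(t)}{y_i(0)}
  &= \sum_{m}\Big[\,Q_i\,\langle\sigma\rangle_m\,\tau_m + \ln h(\mu_i)\Big]
   \approx \sum_{m}\Big[(1-\mu_i)^{\nu}\langle\sigma\rangle_m\tau_m
      + k_m\ln\mu_i + \ln\alpha_{k_m}\Big], \\
0 &= \frac{\partial}{\partial\mu_i}\ln\frac{y_i(t)}{y_i(0)}
   = \sum_{m}\Big[-\nu(1-\mu_i)^{\nu-1}\langle\sigma\rangle_m\tau_m
      + \frac{k_m}{\mu_i}\Big], \\
\mu_{opt}\,(1-\mu_{opt})^{\nu-1}
  &\approx \frac{\langle k\rangle}{\nu\,\langle\sigma\rangle\,\langle\tau\rangle}, \\
\mu_G = \nu\,\mu_{opt}
  &\approx \frac{\langle k\rangle}{\langle\sigma\rangle\langle\tau\rangle},
  \qquad
  Q_{opt} = (1-\mu_{opt})^{\nu} \approx
  \exp\!\Big(-\frac{\langle k\rangle}{\langle\sigma\rangle\langle\tau\rangle}\Big)
  \quad (\nu \gg 1,\ \mu_G \ll 1).
\end{align*}
```

In this sketch the combinatorial factors $\alpha_j$ enter only
through the additive term $\ln\alpha_{k_m}$, which does not
depend on $\mu_i$ and therefore drops out of the derivative,
and the genome length $\nu$ cancels from the genomic rate
$\mu_G$.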



V. SIMULATIONS 

To confirm the theoretical derivations we simulated the
evolution of replicators in continuous time on a moving
single-peaked landscape using a birth-death process. Each time
unit in the continuous time replicator equation is the mean
replacement time of the population and can therefore be
identified as a generation. In the simulation each generation
is divided into $N$ time-steps (where $N$ is the population
size). At each of these time-steps a single individual is
selected to copy and mutate. Individuals are selected with
probability proportional to their relative fitness, which is
given by $\sigma$ or 1 on the single-peaked landscape. Thus a
master-sequence of strain $y_i$ (with mutation rate $\mu_i$)
is chosen with probability proportional to $\sigma$ times its
abundance in the population. The copy replaces a randomly
chosen individual in the existing population, which is then
discarded. Thus the population is replaced one by one in
discrete birth-death events. In the limit of large population
size the dynamics of this simulation approach the continuous
time replicator equation.

The fitness peak is changed every $\tau$ generations to one of
its nearest neighbors. For the binary genomes used here this
is accomplished by flipping a randomly chosen bit in the
definition of the fitness peak.
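
The procedure described above can be sketched in Python as
follows. This is a small illustrative version, not the code
used for the figures: the function name `simulate`, the
default parameter values, the choice to start every individual
on the peak, and the random assignment of strains are all
assumptions. Each strain keeps a fixed mutation rate, as in
the first set of simulations.

```python
import random


def simulate(mus, sigma=5.0, tau=2, nu=25, N=200, generations=40, seed=0):
    """Birth-death simulation on a moving single-peaked binary landscape.

    Returns the number of individuals carrying each mutation rate in `mus`.
    """
    rng = random.Random(seed)
    peak = [0] * nu
    # Each individual is (genome, strain index); all start on the peak (assumption).
    pop = [(list(peak), rng.randrange(len(mus))) for _ in range(N)]
    for gen in range(generations):
        if gen > 0 and gen % tau == 0:
            # Move the peak to a nearest neighbor by flipping one random bit.
            peak[rng.randrange(nu)] ^= 1
        for _ in range(N):  # N birth-death events make up one generation
            # Fitness is sigma on the peak and 1 everywhere else.
            weights = [sigma if genome == peak else 1.0 for genome, _ in pop]
            parent, strain = rng.choices(pop, weights=weights, k=1)[0]
            # Copy the parent, flipping each bit with probability mu of its strain.
            child = [bit ^ (rng.random() < mus[strain]) for bit in parent]
            # The copy replaces a uniformly chosen individual.
            pop[rng.randrange(N)] = (child, strain)
    counts = [0] * len(mus)
    for _, strain in pop:
        counts[strain] += 1
    return counts


counts = simulate([0.001, 0.005, 0.05], N=100, generations=10)
```

The population size and run length here are kept far smaller
than $N = 10^4$ so that the sketch runs quickly; at such sizes
the outcome is dominated by sampling noise and should not be
read as a test of the theory.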

The population was first seeded with a diverse range of
mutation rates and was allowed to evolve while these rates
were kept fixed. This is a true test of $\mu_{opt}$, since the
fastest growing sequence should come to dominate. In general
the population converged to the strain with mutation rate
closest to the theoretically predicted $\mu_{opt}$. Figure 1
shows the mean mutation rate of the population evolving down
towards the theoretically predicted optimum
$\mu_{opt} \approx 0.00445$. From about generation 800 the
variance in mutation rates in the population is larger than
the fluctuations in the mean and the evolution of rates has
effectively ended.
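
The quoted optimum can be checked numerically. The sketch
below assumes that the optimal per-base rate satisfies
$\mu (1 - \mu)^{\nu - 1} = \langle k \rangle / (\nu \langle \sigma \rangle \langle \tau \rangle)$,
which is the stationarity condition obtained by maximizing the
long-term growth with $h(\mu)$ approximated by its leading
term $\mu^{k}$. The function name and the bisection tolerance
are illustrative.

```python
def optimal_point_rate(k, sigma, tau, nu, tol=1e-12):
    """Solve mu * (1 - mu)**(nu - 1) = k / (nu * sigma * tau) by bisection."""
    target = k / (nu * sigma * tau)

    def g(mu):
        return mu * (1.0 - mu) ** (nu - 1) - target

    # The left-hand side increases monotonically for 0 < mu < 1/nu.
    lo, hi = 0.0, 1.0 / nu
    if g(hi) < 0.0:
        raise ValueError("no optimum below 1/nu for these parameters")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Parameters of the simulation: sigma = 5, tau = 2, nu = 25, single-bit shifts.
mu_opt = optimal_point_rate(k=1, sigma=5.0, tau=2.0, nu=25)
```

For these parameters the root agrees with the value 0.00445
quoted above to the three significant figures given there.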

Simulations were also made to study the effects of more
rapidly changing mutator dynamics. In these simulations errors
in the copying process not only introduce changes in the
fitness-determining genotype, but also result in offspring
with slightly different mutation rates than their parents,
i.e. the mutation rate is allowed to evolve. The mutation rate
was treated as a continuous variable which had Gaussian noise
introduced during the copying process.
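
A minimal sketch of such a copying step for the mutation rate
itself is given below. The noise width `noise_sd` and the
clipping of the result to the interval [0, 1] are assumptions;
the text only states that Gaussian noise is added to a
continuous rate.

```python
import random


def mutate_rate(mu, noise_sd=1e-4, rng=random, lo=0.0, hi=1.0):
    """Return the offspring's mutation rate: parent rate plus Gaussian noise.

    The result is clipped to [lo, hi] so that it remains a valid probability.
    """
    return min(hi, max(lo, mu + rng.gauss(0.0, noise_sd)))


rng = random.Random(0)
offspring_rates = [mutate_rate(0.005, noise_sd=1e-3, rng=rng) for _ in range(1000)]
```

In a birth-death simulation this function would be applied to
the parent's rate each time a copy is made, so that selection
can act on the resulting spread of rates.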

FIG. 1. Mean mutation rate evolving towards the optimal rate
of $\mu_{opt} = 0.00445$. Error bars are one standard
deviation about the mean. $\sigma = 5$, $\tau = 2$,
$\nu = 25$, $N = 10^4$.

Fig. 2 shows the evolution of mutation rates in detail in a
population with a reasonably fast rate of change of mutation
rates. This simulation has the same landscape parameters as
Fig. 1. The mean mutation rate fluctuates around the optimum.
For mutation rates close to the optimum, fluctuations in
selection are significantly larger than the selective
advantages of one mutation rate over another. In this region
the evolution of mutation rates is effectively neutral and
thus the mean mutation rate conducts a random walk about the
optimum. We also note that the population typically spends
more time with mutation rates above the optimum than below.
This is mainly a finite population size effect. When the peak
moves and the population size is limited there is a relatively
large probability that there will be no individuals
representing a master-sequence with very low mutation rate on
the new peak. This leads to a temporary increase in mutation
rate in the population after an environmental shift.

FIG. 2. Evolution of mutation rates of a mutationally diverse
population. $\mu_{opt} = 4.45 \times 10^{-3}$, $\sigma = 5$,
$\tau = 2$, $\nu = 25$, $N = 10^4$.

VI. BIOLOGICAL IMPLICATIONS

In nature the existence, and value, of an optimum mutation
rate that results from a changing environment depends on many
different parameters: the time between shifts in the selective
environment, the complex structure of the fitness landscape,
the genome length, co-evolutionary effects, the strength of
selection, neutrality in the fitness landscape, fluctuations
due to finite population sizes, etc. One must therefore be
careful when comparing the results of a simple model, such as
the one we have presented in this paper, with biological
measurements. Nonetheless it is this range of possible
differences between organisms and the complexity of their
evolutionary environments that leads us to consider the
possibility that simple laws of biology, such as the scaling
of point mutation rates with genome length, are likely to have
quite simple explanations that do not depend on the details of
the particular organism. It is therefore worth comparing the
results of the model presented in this paper with the
biological data.

FIG. 3. The shaded region shows the genomic mutation rates for
DNA-based organisms listed in Table 1. For low average fitness
advantage $\sigma$ the mutation rate is relatively insensitive
to the frequency of changes in the environment. For clarity we
have assumed $\langle k \rangle = 1$ in this figure.

For low mutation rates the expression for the optimal genomic
mutation rate derived above is relatively insensitive to
changes in the average fitness or the size and frequency of
environmental changes, as shown in Fig. 3. This insensitivity
of the optimal genomic mutation rates to evolutionary
parameters is important, since the bacteria and phages listed
in Table 1 are most unlikely to live in environments with the
same types of time-dynamics and time-scales. In Fig. 3 we see
that the sensitivity to one of the parameters in the model,
$\sigma$ or $\tau$, depends strongly on the region in which
the other parameter lies. For most realistic populations we
may expect the selective advantage $\sigma$ to be weak, maybe
on average less than 2. The predicted mutation rate is then
highly insensitive to the average time between shifts in the
fitness landscape, e.g. $\sigma = 2$ gives
$\tau \in [110, 200]$ for the organisms listed in Table 1. It
is also reasonable to assume the fitness landscapes of the
organisms listed in Table 1 to be more similar to each other
than to those of higher eukaryotes, and since our predictions
as to $Q_{opt}$ are rather insensitive to the details of
$\sigma(t)$, $\tau$ and $h(\mu)$ we would expect many
organisms to have approximately the same mutation rate per
genome (within an order of magnitude). This is what we observe
for simple DNA-based organisms.
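
The quoted range of $\tau$ can be related to genomic mutation
rates with a few lines of Python. The sketch assumes the
small-rate approximation
$\mu_G \approx \langle k \rangle / (\langle \sigma \rangle \langle \tau \rangle)$
and single-mutation shifts, $\langle k \rangle = 1$; both are
assumptions of this example, and the function name is
hypothetical.

```python
def genomic_rate(k, sigma, tau):
    """Approximate optimal genomic mutation rate, mu_G ~ k / (sigma * tau)."""
    return k / (sigma * tau)


# Genomic rates implied by sigma = 2 at the two ends of tau in [110, 200].
rate_fast_env = genomic_rate(k=1, sigma=2.0, tau=110.0)
rate_slow_env = genomic_rate(k=1, sigma=2.0, tau=200.0)
```

Under these assumptions, nearly doubling the time between
environmental shifts changes the predicted genomic rate by
less than a factor of two, from about 0.0045 to 0.0025.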

VII. RNA VIRUSES 

The lytic RNA viruses consistently show an extremely high
mutation rate, orders of magnitude larger than that of any DNA
virus of similar size. This rate of around one substitution
per genome per generation is inconsistent with the analysis
conducted above for mutation rates evolving in a changing
selective environment. Such high rates imply implausible
values for the dynamic environment parameters.

As an explanation for the high mutation rates observed in
many RNA viruses and the mutation rate scaling with genome
length, it has been suggested that these viruses have evolved
the highest mutation rate possible to be able to adapt to a
rapidly changing environment. The maximal mutation rate is
then given by the error-threshold, which was first discussed
in a model by Eigen et al. OL It basically states that on a
single-peaked fitness landscape an organism must have high
enough copying fidelity that its relative superiority in
reproduction rate multiplied by the probability of producing
a perfect copy of itself is larger than one; otherwise there
will be no effective selection for the genotype. It was later
shown that the error-threshold can rather easily be
generalized to include effects of a dynamic environment [19].
From this argument it is, however, not clear why RNA viruses
should evolve towards the error-threshold while DNA-based
organisms tend to have much lower mutation rates (by orders of
magnitude). In this section we combine the error-threshold
with the model presented in this paper to suggest a possible
explanation for the difference in observed mutation rates
between DNA- and RNA-based organisms.
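
The static error-threshold condition stated above,
$\sigma Q > 1$ with $Q = (1 - \mu)^{\nu}$, can be turned into a
maximal per-base mutation rate. The sketch below does this for
the static single-peaked case only; the function name is
hypothetical and the parameter values are chosen purely for
illustration.

```python
import math


def error_threshold(sigma, nu):
    """Largest per-base rate mu satisfying sigma * (1 - mu)**nu >= 1."""
    return 1.0 - sigma ** (-1.0 / nu)


mu_max = error_threshold(sigma=5.0, nu=25)
genomic_max = 25 * mu_max  # per-genome rate at the threshold
```

For long genomes the per-genome rate at the threshold
approaches $\ln \sigma$ from below, which is of order one for
moderate fitness advantages.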

The dynamic environment model presented in this paper applies
to organisms where the copying fidelity is encoded in a part
of the genome that has little or no effect on fitness. In many
viruses this may not be appropriate, partly because the
proteins involved in mutagenesis may have a multitude of
functions but also because the relatively high selective
pressure towards short genome lengths will result in the
overlap and multiple use of genetic material where possible.
This gives rise to a different possibility for the evolution
of optimal mutation rates and might help explain the large
differences between the observations for RNA- and DNA-based
organisms.

We suggest that for organisms which have strong overlaps
between genes coding for the mutation rate and genes coding
more directly for reproductive advantage there is no effective
selection for lower mutation rates, as long as the mutation
rate is below the error-threshold.



This argument is based on the assumption that most mutations
are deleterious in terms of fitness, and that the relative
fitness advantage on the local peak results in stronger
selection pressure than the pressure towards lower mutation
rates. We also assume that evolution of mutation rates usually
affects regions of the genome where the organism needs
mutations to be able to adapt to changes in the environment.
If these assumptions apply we expect a population to have
mutation rates close to the error-threshold. Changes to the
mutation rate are transient, assuming that the organism is not
pushed beyond the error-threshold.

For this hypothesis to apply, viruses with high mutation
rates (mainly RNA viruses) should have overlapping genes
regulating mutation frequency as well as reproduction rate,
whereas organisms with low mutation rates (such as those
listed in Table 1) should not have overlapping reading frames
in their genomes. There are observations that support this,
but it is unclear whether the correlation is strong enough for
this hypothesis to be valid.



VIII. CONCLUSIONS 

In this paper we have studied the evolution of mutation rates
in a population of multi-locus genomes. The genomic mutation
rate $\mu_G$ leading to the greatest long term growth of a
strain (the optimal rate) was analytically determined for
reasonably general peak shifts and time-dependent replication
rates $\sigma(t)$



where $\langle k \rangle$ is the mean Hamming distance between
successive fitness optima and $\langle \tau \rangle$ is the
mean time between shifts. These optimal rates were
quantitatively confirmed by computational simulations of
populations whose mutation rates were allowed to evolve.

These continuous time multi-locus replicator models predict
the kind of scaling of point-mutation rate with genome length
that has been observed in some bacteria and viruses/phages and
puzzled over for years. When combined with the consequences of
the multiple use/pleiotropic encoding of copying machinery,
these models of the evolution of mutation rate in dynamic
environments also suggest why lytic RNA viruses may have rates
at or about the error-threshold.